Corpus Development Activities at the Center for Spoken Language Understanding

نویسندگان

  • Ronald A. Cole
  • Mike Noel
  • Daniel C. Burnett
  • Mark A. Fanty
  • Terri Lander
  • Beatrice T. Oshika
  • Stephen Sutton
چکیده

This paper describes eight telephone-speech corpora at various stages of development at the Center for Spoken Language Understanding. For each corpus, we describe data collection procedures, methods of soliciting callers, protocol used to collect the data, transcriptions that accompany the speech data, and the expected release date. The corpora are available at no charge to academic institutions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud

The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud N. Bagheri, M.A. E. Abbasi, Ph.D. M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on development of spoken language in 5-6-year-old children at private preschool center...

متن کامل

Telephone Speech Corpus Development at Cslu

This paper describes eight telephone-speech corpora at various stages of development at the Center for Spoken Language Understanding. For each corpus we describe data collection procedures, methods of soliciting callers, protocol used to collect the data, transcriptions that accompany the speech data, and the expected release date. The corpora are (or will be) available at no charge to academic...

متن کامل

پیکره اعلام: یک پیکره استاندارد واحدهای اسمی برای زبان فارسی

Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...

متن کامل

Design and Data Collection for Spoken Polish Dialogs Database

Spoken corpora provide a critical resource for research, development and evaluation of spoken dialog systems. This paper describes the telephone spoken dialog corpus for Polish created by Polish-Japanese Institute of Information Technology team within the LUNA project (IST 033549). The main goal of this project is to create a robust natural spoken language understanding (SLU) toolkit, which can...

متن کامل

Annotations and Tools for an Activity Based Spoken Language Corpus

The paper contains a description of the Spoken Language Corpus of Swedish at the Department of Linguistics, Göteborg University (GSLC), and a summary of the various types of analysis and tools that have been developed for work on this corpus. Work on the corpus was started in the late 1970:s. It is incrementally growing and presently consists of 1.3 million words from about 25 different social ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994